Goto

Collaborating Authors

 positive reward






HIVEX: A High-Impact Environment Suite for Multi-Agent Research (extended version)

arXiv.org Artificial Intelligence

Games have been vital test beds for the rapid development of Agent-based research. Remarkable progress has been achieved in the past, but it is unclear if the findings equip for real-world problems. While pressure grows, some of the most critical ecological challenges can find mitigation and prevention solutions through technology and its applications. Most real-world domains include multi-agent scenarios and require machine-machine and human-machine collaboration. Open-source environments have not advanced and are often toy scenarios, too abstract or not suitable for multi-agent research. By mimicking real-world problems and increasing the complexity of environments, we hope to advance state-of-the-art multi-agent research and inspire researchers to work on immediate real-world problems. Here, we present HIVEX, an environment suite to benchmark multi-agent research focusing on ecological challenges. HIVEX includes the following environments: Wind Farm Control, Wildfire Resource Management, Drone-Based Reforestation, Ocean Plastic Collection, and Aerial Wildfire Suppression. We provide environments, training examples, and baselines for the main and sub-tasks. All trained models resulting from the experiments of this work are hosted on Hugging Face. We also provide a leaderboard on Hugging Face and encourage the community to submit models trained on our environment suite.


Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

arXiv.org Artificial Intelligence

Traditional reinforcement-learning-based agents rely on sparse rewards that often only use binary values to indicate task completion or failure. The challenge in exploration efficiency makes it difficult to effectively learn complex tasks in Minecraft. To address this, this paper introduces an advanced learning system, named Auto MC-Reward, that leverages Large Language Models (LLMs) to automatically design dense reward functions, thereby enhancing the learning efficiency. Auto MC-Reward consists of three important components: Reward Designer, Reward Critic, and Trajectory Analyzer. Given the environment information and task descriptions, the Reward Designer first design the reward function by coding an executable Python function with predefined observation inputs. Then, our Reward Critic will be responsible for verifying the code, checking whether the code is self-consistent and free of syntax and semantic errors. Further, the Trajectory Analyzer summarizes possible failure causes and provides refinement suggestions according to collected trajectories. In the next round, Reward Designer will take further refine and iterate the dense reward function based on feedback. Experiments demonstrate a significant improvement in the success rate and learning efficiency of our agents in complex tasks in Minecraft, such as obtaining diamond with the efficient ability to avoid lava, and efficiently explore trees and animals that are sparse on the plains biome.


Experiential Explanations for Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL in which actions are chosen because of future rewards. However, RL agents discard the qualitative features of their training, making it difficult to recover user-understandable information for "why" an action is chosen. We propose a technique, Experiential Explanations, to generate counterfactual explanations by training influence predictors along with the RL policy. Influence predictors are models that learn how sources of reward affect the agent in different states, thus restoring information about how the policy reflects the environment. A human evaluation study revealed that participants presented with experiential explanations were better able to correctly guess what an agent would do than those presented with other standard types of explanation. Participants also found that experiential explanations are more understandable, satisfying, complete, useful, and accurate. The qualitative analysis provides insights into the factors of experiential explanations that are most useful.


Reward Shaping for Happier Autonomous Cyber Security Agents

arXiv.org Artificial Intelligence

As machine learning models become more capable, they have exhibited increased potential in solving complex tasks. One of the most promising directions uses deep reinforcement learning to train autonomous agents in computer network defense tasks. This work studies the impact of the reward signal that is provided to the agents when training for this task. Due to the nature of cybersecurity tasks, the reward signal is typically 1) in the form of penalties (e.g., when a compromise occurs), and 2) distributed sparsely across each defense episode. Such reward characteristics are atypical of classic reinforcement learning tasks where the agent is regularly rewarded for progress (cf. to getting occasionally penalized for failures). We investigate reward shaping techniques that could bridge this gap so as to enable agents to train more sample-efficiently and potentially converge to a better performance. We first show that deep reinforcement learning algorithms are sensitive to the magnitude of the penalties and their relative size. Then, we combine penalties with positive external rewards and study their effect compared to penalty-only training. Finally, we evaluate intrinsic curiosity as an internal positive reward mechanism and discuss why it might not be as advantageous for high-level network monitoring tasks.


Uniswap Liquidity Provision: An Online Learning Approach

arXiv.org Artificial Intelligence

Decentralized Exchanges (DEXs) are new types of marketplaces leveraging Blockchain technology. They allow users to trade assets with Automatic Market Makers (AMM), using funds provided by liquidity providers, removing the need for order books. One such DEX, Uniswap v3, allows liquidity providers to allocate funds more efficiently by specifying an active price interval for their funds. This introduces the problem of finding an optimal strategy for choosing price intervals. We formalize this problem as an online learning problem with non-stochastic rewards. We use regret-minimization methods to show a liquidity provision strategy that guarantees a lower bound on the reward. This is true even for non-stochastic changes to asset pricing, and we express this bound in terms of the trading volume.


Explainable Reinforcement Learning on Financial Stock Trading using SHAP

arXiv.org Artificial Intelligence

Explainable Artificial Intelligence (XAI) research gained prominence in recent years in response to the demand for greater transparency and trust in AI from the user communities. This is especially critical because AI is adopted in sensitive fields such as finance, medicine etc., where implications for society, ethics, and safety are immense. Following thorough systematic evaluations, work in XAI has primarily focused on Machine Learning (ML) for categorization, decision, or action. To the best of our knowledge, no work is reported that offers an Explainable Reinforcement Learning (XRL) method for trading financial stocks. In this paper, we proposed to employ SHapley Additive exPlanation (SHAP) on a popular deep reinforcement learning architecture viz., deep Q network (DQN) to explain an action of an agent at a given instance in financial stock trading. To demonstrate the effectiveness of our method, we tested it on two popular datasets namely, SENSEX and DJIA, and reported the results.